Despite the identification of severe acute respiratory syndrome-related coronavirus (SARSr-CoV) in
Rhinolophus Chinese horseshoe bats (SARSr-Rh-BatCoV) in China, the evolutionary and possible recombination
origin of SARSr-CoV remains undetermined. We carried out the first study to investigate the migration pattern 
and SARSr-Rh-BatCoV genome epidemiology in Chinese horseshoe bats during a 4-year period. Of 1,401 
Chinese horseshoe bats from Hong Kong and Guangdong, China, that were sampled, SARSr-Rh-BatCoV was 
detected in alimentary specimens from 130 (9.3%) bats, with peak activity during spring. A tagging exercise of 
511 bats showed migration distances from 1.86 to 17 km. Bats carrying SARSr-Rh-BatCoV appeared healthy, 
with viral clearance occurring between 2 weeks and 4 months. However, lower body weights were observed in 
bats positive for SARSr-Rh-BatCoV, but not Rh-BatCoV HKU2. Complete genome sequencing of 10 SARSr- 
Rh-BatCoV strains showed frequent recombination between different strains. Moreover, recombination was 
detected between SARSr-Rh-BatCoV Rp3 from Guangxi, China, and Rf1 from Hubei, China, in the possible
generation of civet SARSr-CoV SZ3, with a breakpoint at the nsp16/spike region. Molecular clock analysis
showed that SARSr-CoVs were newly emerged viruses with the time of the most recent common ancestor 
(tMRCA) at 1972, which diverged between civet and bat strains in 1995. The present data suggest that 
SARSr-Rh-BatCoV causes acute, self-limiting infection in horseshoe bats, which serve as a reservoir for 
recombination between strains from different geographical locations within reachable foraging range. Civet 
SARSr-CoV is likely a recombinant virus arising from SARSr-CoV strains closely related to SARSr-Rh-BatCoV 
Rp3 and Rf1. Such frequent recombination, coupled with rapid evolution especially in the ORF7b/ORF8 region, in
these animals may have accounted for the cross-species transmission and emergence of SARS. 


Coronaviruses can infect a wide variety of animals, causing 
respiratory, enteric, hepatic, and neurological diseases with 
different degrees of severity. On the basis of genotypic and 
serological characteristics, coronaviruses were classified into 
three distinct groups (2, 20, 54). Among coronaviruses that 
infect humans, human coronavirus 229E (HCoV-229E) and 
human coronavirus NL63 (HCoV-NL63) belong to group 1 
coronaviruses, and human coronavirus OC43 (HCoV-OC43)
and human coronavirus HKU1 (HCoV-HKU1) belong to
group 2 coronaviruses, whereas severe acute respiratory
syndrome-related coronavirus (SARSr-CoV) has been classified
as a group 2b coronavirus, distantly related to group 2a and
the recently discovered group 2c and 2d coronaviruses (6, 8, 10,
18, 31, 38, 43, 46, 49, 50). Recently, the Coronavirus Study
Group of the International Committee for Taxonomy of Viruses
has proposed renaming the traditional group 1, 2, and 3
coronaviruses Alphacoronavirus, Betacoronavirus, and
Gammacoronavirus, respectively (http://talk.ictvonline.org/media/p/1230.aspx).

Among all coronaviruses, SARSr-CoV has caused the most 
severe disease in humans, with over 700 fatalities since the 
SARS epidemic in 2003. Although the identification of SARSr- 
CoV in Himalayan palm civets and raccoon dogs in live animal 
markets in southern China suggested that wild animals could 
be the origin of SARS (11), the presence of the virus in only 
market or farmed civets, but not civets in the wild, and the 
rapid evolution of SARSr-CoV genomes in market civets
suggested that these caged animals were only intermediate hosts
(24, 39, 42, 52). Since bats are commonly found and served in 
wild animal markets and restaurants in Guangdong, China 
(47), we have previously carried out a study of bats from the 
region and identified a SARSr-CoV in Rhinolophus Chinese 
horseshoe bats (SARSr-Rh-BatCoV) (21). Similar viruses have 
also been found in three other species of horseshoe bats in 
mainland China (25), supporting the hypothesis that horseshoe
bats are a reservoir of SARSr-CoV. Recently, viruses closely
related to SARSr-Rh-BatCoV in China were also reported in
Chaerophon bats from Africa, although only partial
RNA-dependent RNA polymerase (RdRp) sequences were available
(41). In addition, more than 10 previously unrecognized
coronaviruses of huge diversity have since been identified in bats
from China and other countries (1, 3, 5, 9, 22, 27, 32, 33, 40, 46, 
51), suggesting that bats play an important role in the ecology 
and evolution of coronaviruses. 

As a result of the unique mechanism of viral replication, 
coronaviruses have a high frequency of recombination (20). 
Such a high recombination rate, coupled with the infidelity of 
the polymerases of RNA viruses, may allow them to adapt to 
new hosts and ecological niches (12, 48). Recombination in 
coronaviruses was first recognized between different strains of 
murine hepatitis virus (MHV) and subsequently in other
coronaviruses, such as infectious bronchitis virus, between MHV
and bovine coronavirus, and between feline coronavirus type I 
and canine coronavirus generating feline coronavirus type II 
(12, 16, 17, 23). Recently, by complete genome analysis of 22 
strains of HCoV-HKU1, we have also documented for the first
time that natural recombination events in a human coronavirus 
can give rise to three different genotypes (48). 

Although previous studies have attempted to study the
possible evolutionary and recombination origin of SARSr-CoV,
no definite conclusion can be made on whether the viruses 
from bats are the direct ancestor of SARSr-CoV in civets and 
humans, given the paucity of available strains and genome 
sequences. To better define the epidemiology and evolution of 
SARSr-Rh-BatCoV in China and their role as a recombination 
origin of SARSr-CoV in civets, we carried out a 4-year study 
on coronaviruses in Chinese horseshoe bats in Hong Kong and 
Guangdong Province of southern China. Bat tagging was also 
performed to study the migration pattern of bats and viral 
persistence. The complete genomes of 10 strains of
SARSr-Rh-BatCoV obtained at different times were sequenced and
compared to previously sequenced genomes. With the
availability of this larger set of genome sequences for more accurate
analysis, recombination and molecular clock analyses were 
performed to elucidate the evolutionary origin and time of 
interspecies transmission of SARSr-CoV. 

MATERIALS AND METHODS 

Sample collection and bat tagging. Sample collection was approved by and 
performed in collaboration with the Department of Agriculture, Fisheries and 
Conservation (AFCD) of the Hong Kong Special Administrative Region 
(HKSAR) and Guangzhou Center for Disease Control and Prevention,
Guangzhou, China. Chinese horseshoe bats (Rhinolophus sinicus) were captured from
various locations in Hong Kong and in Guangdong Province of southern China
over a 4-year period (April 2004 to March 2008). Respiratory and alimentary
specimens of the bats were collected using procedures described previously (21,
53). All specimens were placed in viral transport medium before transportation
to the laboratory for RNA extraction. To assess the migration range and
chronicity of coronavirus infections, 511 bats from Hong Kong were also tagged
during sample collection before release. Tagged bats, when identified in
subsequent site visits, were recaptured and recorded for sample collection before
release.

RNA extraction. Viral RNA was extracted from the respiratory and alimentary 
specimens using the QIAamp viral RNA minikit (QIAgen, Hilden, Germany). The
RNA was eluted in 50 µl of AVE buffer and was used as the template for reverse
transcription-PCR (RT-PCR).

RT-PCR for coronaviruses and DNA sequencing. Coronavirus screening was
performed by amplifying a 440-bp fragment of the RdRp gene of coronaviruses 
using conserved primers (5'-GGTTGGGACTATCCTAAGTGTGA-3' and 5'- 
CCATCATCAGATAGAATCATCATA-3') designed by multiple alignments of 
the nucleotide sequences of available RdRp genes of known coronaviruses (49). 
Reverse transcription was performed using the Superscript III kit (Invitrogen, 
San Diego, CA). The PCR mixture (25 µl) contained cDNA, PCR buffer (10 mM
Tris-HCl [pH 8.3], 50 mM KCl, 3 mM MgCl2, and 0.01% gelatin), 200 µM (each)
deoxynucleoside triphosphates (dNTPs), and 1.0 U Taq polymerase (Applied
Biosystems, Foster City, CA). The mixtures were amplified in 40 cycles of PCR,
with 1 cycle consisting of 1 min at 94°C, 1 min at 48°C, and 1 min at 72°C, and
a final extension step of 10 min at 72°C in an automated thermal cycler (Applied
Biosystems, Foster City, CA). Standard precautions were taken to avoid PCR
contamination, and no false-positive result was observed for the negative
controls.

The PCR products were gel purified using the QIAquick gel extraction kit 
(QIAgen, Hilden, Germany). Both strands of the PCR products were sequenced 
twice with an ABI Prism 3700 DNA analyzer (Applied Biosystems, Foster City, 
CA) using the two PCR primers. The sequences of the PCR products were 
compared with known sequences of the RdRp genes of coronaviruses in the 
GenBank database. 

All cDNAs positive for SARSr-Rh-BatCoV were subjected to Rh-BatCoV 
HKU2 screening using Rh-BatCoV HKU2-specific primers (5'-GGAGTATGC 
AGCGTTGGGTTA-3' and 5'-GACACATAGCGCTCAAGCAAA-3'), and all 
cDNAs positive for Rh-BatCoV HKU2 were subjected to SARSr-Rh-BatCoV 
screening using SARSr-Rh-BatCoV-specific primers (5'-CAAGTGGGGTAAG 
GCTAGACTTT-3' and 5'-AACATATTATGCCAGCCACCATA-3') under the
PCR conditions described above.

Statistical analysis. Comparison of the body weights of bats in different groups 
was performed using Student’s t test and covariate analysis (SPSS version 11.5). 
A P value of <0.05 was regarded as statistically significant.
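
The two-group comparison described above can be sketched as a pooled-variance (Student's) t statistic. This is a minimal illustration only, not the SPSS procedure used in the study: the function name `student_t` and the toy body weights are hypothetical, and the lookup of the P value from the t distribution is omitted.

```python
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t test."""
    n1, n2 = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical body weights in grams (illustrative values, not study data).
positive = [10.0, 11.0, 12.0]
negative = [12.0, 13.0, 14.0]
t, df = student_t(positive, negative)
```

A negative t here simply means that the first group has the lower mean; significance would then be judged against the t distribution with `df` degrees of freedom.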

Complete genome sequencing of SARSr-Rh-BatCoV. Ten complete genomes 
of SARSr-Rh-BatCoV detected in the present study were amplified and
sequenced using the RNA extracted from an alimentary specimen as the template.
The RNA was converted to cDNA by a combined random priming and oligo(dT) 
priming strategy. The cDNA was amplified by degenerate primers as described 
previously (21). A total of 57 sets of primers, available on request, were used for 
PCR. The 5' end of the viral genome was confirmed by rapid amplification of 
cDNA ends (RACE) using the 5'/3' RACE kit (Roche, Germany). Sequences
were assembled and manually edited to produce the final sequences. 

Genome analysis. The nucleotide sequences of the genomes and the deduced 
amino acid sequences of the open reading frames (ORFs) were compared to 
those of other coronaviruses using the CoVDB coronavirus database (14).
Phylogenetic tree construction was performed using the neighbor-joining method
with ClustalX 1.83. 

Bootscan analysis. Sliding window analysis was used to detect possible
recombination, using nucleotide alignment of the available genome sequences of
different SARSr-Rh-BatCoV strains and civet SARSr-CoV SZ3 generated by
ClustalX version 1.83 and edited manually. Bootscan analysis was performed
using Simplot version 3.5.1 (26) (F84 model; window size, 1,500 bp; step size, 300
bp) with selected strains, including SARSr-Rh-BatCoV Rf1 and civet
SARSr-CoV SZ3, as the query sequence.
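
The windowing scheme stated above (window size, 1,500 bp; step size, 300 bp) can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names `sliding_windows` and `window_identity` are hypothetical, and the per-window score shown is plain pairwise identity, whereas a true bootscan builds bootstrapped trees under the F84 model in every window.

```python
def sliding_windows(alignment_length, window=1500, step=300):
    """Yield (start, end) half-open coordinates of each full-size window."""
    start = 0
    while start + window <= alignment_length:
        yield start, start + window
        start += step

def window_identity(seq_a, seq_b, start, end):
    """Fraction of identical, ungapped columns between two aligned sequences."""
    pairs = [(x, y) for x, y in zip(seq_a[start:end], seq_b[start:end])
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

# Windows over a hypothetical 3,000-column alignment.
windows = list(sliding_windows(3000))
```

Scoring the query against each candidate parent in every window and plotting the scores against window position is what reveals an abrupt change of nearest relative at a breakpoint.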

Estimation of synonymous and nonsynonymous substitution rates. The
number of synonymous substitutions per synonymous site, Ks, and the number of
nonsynonymous substitutions per nonsynonymous site, Ka, for each coding
region were calculated for all available SARSr-Rh-BatCoV, civet SARSr-CoV, and
human SARSr-CoV genomes using the Nei-Gojobori method (Jukes-Cantor) in 
MEGA 3.1 (19). Identical genes were excluded from analysis. 
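
The final step of the calculation described above can be sketched as follows. This is a minimal illustration, not the MEGA implementation: it assumes that the proportions of synonymous and nonsynonymous differences (`p_s`, `p_n`) have already been obtained by the Nei-Gojobori site-counting step, which is omitted here, and the function names are hypothetical.

```python
from math import log

def jukes_cantor(p):
    """Jukes-Cantor distance for an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * log(1 - 4 * p / 3)

def ka_ks(p_n, p_s):
    """Ka/Ks from proportions of nonsynonymous (p_n) and synonymous (p_s) differences."""
    ks = jukes_cantor(p_s)
    if ks == 0:
        # No synonymous change: the ratio is undefined for such gene pairs.
        raise ZeroDivisionError("Ks is zero; ratio undefined")
    return jukes_cantor(p_n) / ks
```

The undefined ratio when Ks is zero shows one reason why identical sequences cannot contribute to such an analysis.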

Estimation of divergence dates. The time of the most recent common ancestor 
(tMRCA) and the time of divergence were estimated on the basis of an
alignment of ORF1 sequences, using the uncorrelated exponentially distributed
relaxed clock model (UCED) in BEAST version 1.4 (7). Under this model, the
rates were allowed to vary at each branch drawn independently from an
exponential distribution. The sampling dates of all strains were collected from the
literature or from the present study and were used as calibration points.
Depending on the data set, Markov chain Monte Carlo (MCMC) sample chains
were run for 1 × 10⁸ states, sampling every 1,000 generations under the GTR
nucleotide substitution model, determined by MODELTEST and allowing γ-rate
heterogeneity for all data sets. A constant population coalescent prior was
assumed for all data sets. The median and the highest posterior density regions
at 95% (HPD) were calculated for each of these parameters from two identical
but independent MCMC chains using TRACER 1.3 (http://beast.bio.ed.ac.uk).

FIG. 1. Map showing the locations of bat sampling and tagging in Hong Kong. Squares represent the locations where bats were positive for
SARSr-Rh-BatCoV, dark circles represent locations where bats were positive for Rh-BatCoV HKU2, and triangles represent locations where the 
bats were positive for both Rh-BatCoV HKU2 and SARSr-Rh-BatCoV. The percentages indicate the proportion of bats positive for SARSr-Rh- 
BatCoV, Rh-BatCoV HKU2, or SARSr-Rh-BatCoV/Rh-BatCoV HKU2 at each location. Blank circles represent locations negative for SARSr- 
Rh-BatCoV and Rh-BatCoV HKU2. The red circle represents the location of Shenzhen Dongman market (SZDM) in China where civet 
SARSr-CoV was first identified. The arrows indicate the direction of migration of Chinese horseshoe bats as demonstrated in the tagging exercise. 


The tree was annotated by TreeAnnotator, a program of BEAST and displayed by 
FigTree (http://tree.bio.ed.ac.uk/software/figtree/). 

Nucleotide sequence accession numbers. The nucleotide sequences of the 10 
genomes of SARSr-Rh-BatCoV have been lodged within the GenBank sequence 
database under accession no. GQ153539 to GQ153548. 


RESULTS 

Epidemiology of SARSr-Rh-BatCoV in Chinese horseshoe 
bats. A total of 1,398 respiratory specimens and 1,648 alimentary
specimens from 1,337 and 64 Chinese horseshoe bats were
obtained from Hong Kong and Guangdong Province in
southern China, respectively, over the 4-year study period.
RT-PCR of a 440-bp fragment in the RdRp genes of
coronaviruses was positive for SARSr-Rh-BatCoV in respiratory
specimens from 2 of the 1,337 Chinese horseshoe bats from 
Hong Kong and in alimentary specimens from 126 (9.4%) of 
the 1,337 Chinese horseshoe bats from Hong Kong and 4 
(6.3%) of the 64 Chinese horseshoe bats from Guangdong, 
China, with >99% nucleotide identities to SARSr-Rh-BatCoV 
(GenBank accession no. DQ022305) (21). Another previously 
described group 1 coronavirus, Rhinolophus bat coronavirus 
HKU2 (Rh-BatCoV HKU2), coinfecting Chinese horseshoe 
bats, was identified in alimentary specimens from 62 (4.6%) 
bats from Hong Kong and from 7 (10.9%) bats from
Guangdong, China, with >99% nucleotide identities to Rh-BatCoV
HKU2 (GenBank accession no. DQ249235) (22). Seventeen 
bats from Hong Kong and three bats from Guangdong, China, 
were coinfected by SARSr-Rh-BatCoV and Rh-BatCoV 
HKU2. The 126 bats from Hong Kong positive for SARSr-Rh- 
BatCoV were from 15 of the 27 sampling locations in Hong 
Kong, with bats from seven locations harboring both viruses 
(Fig. 1). Peak activity of both SARSr-Rh-BatCoV and
Rh-BatCoV HKU2 was observed in the spring (see Fig. S1 in the
supplemental material). However, the prevalence of SARSr- 
Rh-BatCoV was higher than that of Rh-BatCoV HKU2 during 
the spring of 2005 and 2007, while the prevalence of Rh- 
BatCoV HKU2 was higher than that of SARSr-Rh-BatCoV in 
the spring of 2006. 
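
The detection rates quoted above follow directly from the reported counts, as the short check below illustrates. The function name `prevalence` is hypothetical; the counts are those given in the text for alimentary specimens positive for SARSr-Rh-BatCoV.

```python
def prevalence(positive, total):
    """Return prevalence as a percentage."""
    return 100.0 * positive / total

# Counts of SARSr-Rh-BatCoV-positive bats (alimentary specimens) reported above.
hong_kong = prevalence(126, 1337)
guangdong = prevalence(4, 64)
overall = prevalence(126 + 4, 1337 + 64)
print(f"{hong_kong:.2f}% {guangdong:.2f}% {overall:.2f}%")
```

Rounded to one decimal place, these values correspond to the 9.4%, 6.3%, and 9.3% figures given in the text.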

A total of 511 Chinese horseshoe bats from 11 sites were 
tagged, with 152 (29.7%) recapturing episodes from six sites 
during subsequent visits (Fig. 2). A total of 113 tagged bats 
were recaptured, with 84 bats recaptured once, 21 recaptured 
twice, 6 recaptured three times, and 2 recaptured four times 
after tagging. The time interval between tagging and recapture 
of the same bat ranged from 2 weeks to 21 months. Migration 
between water tunnels at short distances was most common 
(Fig. 1). The longest distance of migration was approximately 
17 km within 3 months from tagging to recapture (October 
2006 to January 2007), while the shortest distance between two 
habitats was 1.86 km. Sixteen bats were positive for SARSr- 
Rh-BatCoV, and 23 were positive for Rh-BatCoV HKU2 at 
the time of tagging, with one bat being positive for both
viruses. Among these 38 bats, 10 bats were recaptured, but all
were subsequently negative for coronaviruses within a period

FIG. 2. Schematic diagram showing the number of Rhinolophus sinicus (Chinese horseshoe) bats tagged and recaptured, and the presence of
coronaviruses among these tagged bats. The numbers in boldface type indicate the number of bats successfully recaptured. The numbers in roman 
type (not boldface type) following dashed lines are the numbers of bats not recaptured in subsequent visits. The numbers in parentheses are the 
number of recaptured bats positive for coronaviruses. CoV +ve, coronavirus positive; CoV -ve, coronavirus negative. 


of 4 to 16 months (Fig. 2). Twenty-three and nine bats initially 
negative for coronaviruses were subsequently positive for 
SARSr-Rh-BatCoV and Rh-BatCoV HKU2, respectively, 
among which seven bats were positive for both viruses.
However, only one of these bats was positive for coronavirus at
more than one episode, which carried SARSr-Rh-BatCoV at 
first and both SARSr-Rh-BatCoV and Rh-BatCoV HKU2 
during the next visit at the same site 2 weeks later (Fig. 2). 
With the longest shedding period of 2 weeks and the shortest 
documented clearance time of 4 months in bats positive for 
SARSr-Rh-BatCoV, it is estimated that viral clearance
occurred between 2 weeks and 4 months.
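
The recapture figures reported for the tagging exercise are internally consistent, as the following short check shows. The variable names are illustrative; the counts (84, 21, 6, and 2 bats recaptured once to four times, out of 511 tagged) are those stated in the text.

```python
# Number of bats recaptured once, twice, three times, and four times.
recaptures = {1: 84, 2: 21, 3: 6, 4: 2}

# Individual bats that were recaptured at least once.
bats_recaptured = sum(recaptures.values())

# Each bat contributes one episode per recapture.
episodes = sum(times * bats for times, bats in recaptures.items())

# Recapture episodes expressed per 100 tagged bats.
episodes_per_100_tagged = 100.0 * episodes / 511
```

The totals reproduce the 113 recaptured bats and the 152 recapturing episodes (29.7%) given in the text. The percentage is therefore episodes relative to the number of tagged bats, not the fraction of individual bats recaptured.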

No disease association was observed in bats positive for 
SARSr-Rh-BatCoV or Rh-BatCoV HKU2. However, lower 
body weights were observed in bats positive for SARSr-Rh- 
BatCoV (body weight [mean ± standard deviation {SD}], 
10.9 g ± 1.4 g) than those negative for coronaviruses (body 
weight [mean ± SD], 11.6 ± 2.2 g) (P < 0.0001 by Student’s t 
test). A similar phenomenon was not observed when bats
positive for Rh-BatCoV HKU2 (body weight [mean ± SD], 11.5 ±
1.5 g) were used for comparison (P = 0.783 by Student’s t test). 
To control for the confounding effect of age and possible lower 
body weights after hibernation during which SARSr-Rh- 
BatCoV showed the highest detection rate, covariate analysis 
was performed using only data from the peak season (during 
March) with forearm lengths (which correlate with age) as a 
possible cofactor. The results showed that the SARSr-Rh- 
BatCoV carriage state is an independent factor in association 
with lower body weight (P = 0.002). Similarly, no significant 
difference in body weight was observed when similar analysis 
was performed on bats positive for Rh-BatCoV HKU2 despite 
its similar seasonality, suggesting that this phenomenon is
specific to SARSr-Rh-BatCoV.

Complete genome comparison of SARSr-Rh-BatCoV
genomes. In addition to the eight previously described genomes
of SARSr-Rh-BatCoVs, complete genome sequence data of 10
additional strains of SARSr-Rh-BatCoV were obtained by
assembly of the sequences of the RT-PCR products obtained
directly from 10 individual specimens collected at different
times. Eight strains were detected in bats from Hong Kong,
while two strains were from bats from Guangdong, China (see
Table S1 in the supplemental material). Their genome sizes
were 29,677 to 29,716 nucleotides, with a G+C content of 
41%, comparable to the previously reported genomes. The 
eight Hong Kong strains were more closely related to each 
other with an overall nucleotide identity of 99.9%, while the 
two strains from Guangdong, China, had 98.5% nucleotide 
identity to the Hong Kong strains. Except for strain HKU3-8 
from Guangdong, China, all SARSr-Rh-BatCoV strains share 
the same genome organization, containing the putative
transcription regulatory sequence (TRS) motif, 5'-ACGAAC-3', at
the 3' end of the 5' leader sequence and preceding each ORF
except ORF7b. 

Similar to previous findings, analysis of the full-length
sequences of all currently available SARSr-Rh-BatCoV genomes
showed that the major differences between SARSr-Rh-BatCoV
genomes and civet/human SARSr-CoV genomes were
observed in spike (mainly S1 domain), ORF3, and ORF8 regions,
which were also the most variable regions among human
SARSr-CoV genomes (21, 25, 34). All genomes possessed 87%
nucleotide identities to civet and human SARSr-CoV, except
for SARSr-Rh-BatCoV strain Rp3, which possessed 91% and
92% nucleotide identities to civet and human strains,
respectively. The higher overall sequence similarity of strain Rp3 to
civet and human strains is mainly due to the higher sequence
homology within the ORF1 region. At nsp2, nsp3, nsp12, and
nsp14 regions, strain Rp3 possessed the highest amino acid
identities among all SARSr-Rh-BatCoV strains to the
corresponding regions in civet SARSr-CoV SZ3 (see Table S2 in the
supplemental material). Interestingly, the sequences of our
SARSr-Rh-BatCoV strains from Hong Kong and Guangdong,
China, possessed higher (98%) amino acid identities in the
nsp1 region to civet SARSr-CoV than other strains from China
(92 to 93%). On the other hand, at ORF3a and ORF8, the
sequences of strains Rf1 and 273/04 possessed the highest
amino acid identities to those of civet SARSr-CoV (see Table
S2 in the supplemental material).

ORF8 represents the most variable region within the 
SARSr-CoV genomes. In contrast to human SARSr-CoV 
which contains a 29-bp deletion at ORF8 region which resulted 
in two overlapping ORFs, ORF8a and ORF8b, all
SARSr-Rh-BatCoV genomes except that of strain HKU3-8 contain a
single long ORF8, similar to civet SARSr-CoV. This 29-bp
deletion present only in human strains has been shown to disrupt
the functional expression of the ORF8 region (30). Strain
HKU3-8 has a short deletion at the ORF8 region that breaks
this ORF into three small ones (see Fig. S2 in the supplemental
material). This 26-bp deletion was only 14 bp downstream of
the 29-bp deletion in human SARSr-CoV, suggesting that this
region is a frequent site for deletions. As a result of the
frequent deletions observed within this region, the ORF8 region
of SARSr-Rh-BatCoV possessed very low (35 to 37%) amino
acid identities to that of civet SARSr-CoV, except for two
strains, Rf1 and 273/04, which possessed 80% amino acid
identities (see Table S2 in the supplemental material).

Phylogenetic analysis. Phylogenetic trees were constructed 
using the nucleotide sequences of the nonstructural protein 3 
(nsp3), RdRp, spike (S), ORF3a, envelope protein (E),
membrane protein (M), ORF8, and nucleocapsid protein (N) genes
of SARSr-CoV (see Fig. S3 in the supplemental material). In
general, SARSr-Rh-BatCoV strains from the same
geographical area are more closely related to each other. Among all
SARSr-Rh-BatCoV genomes, strain Rp3 from Guangxi
Province, China, is most closely related to human and civet strains
in the ORF1 region, as exemplified by their close clustering in
the nsp3 and RdRp trees. However, from the S gene onwards,
strain Rp3 was more closely related to other
SARSr-Rh-BatCoVs than to human and civet strains. Moreover, from
ORF3a to ORF8, clustering of human and civet SARSr-CoVs
with strains Rf1 and 273/04 from Hubei Province, China, was
observed. This suggested that civet SARSr-CoV may have 
arisen from recombination between strains from different
geographical locations that were related to present strains from
Guangxi and Hubei provinces in China. 

Recombination analysis. To detect recombination between 
genomes of different strains of SARSr-Rh-BatCoV or civet 
SARSr-CoV, sliding window analysis was conducted. Results 
showed frequent recombination events among the bat viruses 
in China. When civet SARSr-CoV SZ3 was used as the query 
sequence with SARSr-Rh-BatCoV strains Rm1, Rf1, and Rp3
as the potential parents, a recombination breakpoint at the
nsp16/S intergenic region was identified (Fig. 3). Upstream of
this breakpoint before position 21300, high bootstrap support
for clustering of civet SARSr-CoV SZ3 with
SARSr-Rh-BatCoV strain Rp3 was observed. However, an abrupt change
in clustering occurred after position 21300, with high bootstrap
support for clustering of civet SARSr-CoV SZ3 with
SARSr-Rh-BatCoV strain Rf1. This is in line with results from
phylogenetic analysis, where the ORF1 sequences of civet and
human SARSr-CoV strains clustered with SARSr-Rh-BatCoV
Rp3, but the sequences from ORF3a to ORF8 clustered with
SARSr-Rh-BatCoV strains Rf1 and 273/04.

Apart from this recombination event, other putative
recombination events were also observed when SARSr-Rh-BatCoV
strains were used as the query sequence. When
SARSr-Rh-BatCoV strain Rf1 was used as the query sequence with
SARSr-Rh-BatCoV HKU3-1, 279/04, and civet SARSr-CoV
SZ3 as the potential parents, putative recombination events
were observed throughout the genome, as shown by frequent
shuffling of clustering with the three putative parent strains (Fig.
4A). The most notable site occurred at around position 20700,
corresponding to nsp16. From positions 16400 to 20700, high
bootstrap support for clustering with strain 279/04 was
observed. From position 20700 onwards and especially toward
the 3' end of the genome, high bootstrap support for clustering
with civet SARSr-CoV SZ3 was observed. These bootscan
results were also supported by the shifting of positions upon
phylogenetic analysis (Fig. 4B). From positions 16400 to 20700
(corresponding to helicase to nsp16), SARSr-Rh-BatCoV Rf1
was clustered with strain 279/04 with a bootstrap value of 1,000
away from civet SARSr-CoV. From positions 20700 to 25000
(corresponding to nsp16 to S2), it exhibited a distant
relationship with both other SARSr-Rh-BatCoVs and civet
SARSr-CoV. However, from position 25000 (S2) onwards, it was more
closely related to civet and human SARSr-CoV strains than to
other SARSr-Rh-BatCoV. Similar results were also observed
when similar analysis was performed using strains 273/04,
279/04, and Rm1 as the query sequence with corresponding strains
as the potential parents.

On the other hand, when SARSr-Rh-BatCoV strain Rf1 was
used as the query sequence with strains 273/04, Rm1, and Rp3
as the potential parents, a single recombination breakpoint
from position 18300 to 19900 corresponding to nsp14/15 (Fig.
5A) was observed. Before position 18300 and after position
19900, high bootstrap support for clustering between strains
Rf1 and 273/04 was observed, whereas between these two
positions, an abrupt shift in phylogenetic signals occurred, with
high bootstrap support for clustering with strain Rm1. In fact,
from phylogenetic analysis of other regions of the whole
genome, strain Rf1 is closely related to 273/04, and
SARSr-Rh-BatCoV Rm1 is closely related to SARSr-Rh-BatCoV 279/04,

FIG. 3. Bootscan analysis using the genome sequence of civet SARSr-CoV strain SZ3 as the query sequence. Bootscanning was conducted with
Simplot version 3.5.1 (F84 model; window size, 1,500 bp; step size, 300 bp). SARSr-Rh-BatCoV strain Rf1, SARSr-Rh-BatCoV strain Rp3, and
SARSr-Rh-BatCoV strain Rm1 were examined by bootscan analysis.


except in this breakpoint region, where discordance of
phylogenetic positions was observed (Fig. 5B). These results suggest
that SARSr-Rh-BatCoV strains from different bat species may 
undergo frequent recombination. 

Estimation of synonymous and nonsynonymous substitution 
rates. Using all available SARSr-Rh-BatCoV genome
sequences for analysis, except for strain HKU3-8 not used for
ORF8 analysis, the Ka/Ks ratios for the various coding regions,
compared to those of civet SARSr-CoV and human
SARSr-CoV, are shown in Table 1. Notably, the Ka/Ks ratio for the S gene
of SARSr-Rh-BatCoV is only 0.054, compared to that of civet
SARSr-CoV (1.5) and human SARSr-CoV (1.0), suggesting
that the spike gene of SARSr-Rh-BatCoV is unlikely to be under
positive selection. Moreover, the Ka/Ks ratios for ORF3a, E,
and M of SARSr-Rh-BatCoV were also markedly lower than
those for civet and/or human SARSr-CoV. On the other hand,
the highest Ka/Ks ratios were observed at ORF7b (0.546) and
ORF8 (0.554), suggesting that this region is under strong
positive selection.

Estimation of divergence dates. Using the uncorrelated
relaxed clock model on ORF1ab, the date of the most recent
common ancestor (MRCA) of all SARSr-CoVs was estimated
to be 1972.39 (HPDs, 1935.28 to 1990.63), approximately 31
years before the SARS epidemic (Fig. 6). The date of
divergence between human or civet SARSr-CoV and the closest
SARSr-Rh-BatCoV was estimated to be 1995.10 (HPDs,
1986.53 to 2000.13), approximately 8 years before the SARS
epidemic. Moreover, the MRCA date of human and civet
SARSr-CoV was estimated to be 2001.36 (HPDs, 1999.16 to
2002.14). The estimated mean substitution rate of the ORF1ab
data set under the UCED model was 2.82 × 10⁻³ substitutions
per site per year. This estimate is comparable to a previous
estimation using fewer SARSr-Rh-BatCoV genome sequences
(2.79 × 10⁻³ substitutions per site per year) and the estimate in
other RNA viruses (13, 15).
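
The intervals quoted above are simple differences between the estimated decimal-year dates and the year of the epidemic, as sketched below. The dictionary labels and the function name `years_before_epidemic` are illustrative; the reference year of 2003 is the epidemic year stated in the introduction.

```python
EPIDEMIC_YEAR = 2003  # year of the SARS epidemic, as stated in the text

# Median date estimates (decimal years) reported in the text.
estimates = {
    "tMRCA of all SARSr-CoVs": 1972.39,
    "bat vs. civet/human divergence": 1995.10,
    "tMRCA of human and civet SARSr-CoV": 2001.36,
}

def years_before_epidemic(decimal_year, epidemic_year=EPIDEMIC_YEAR):
    """Interval in years between a dated node and the epidemic."""
    return epidemic_year - decimal_year

intervals = {name: years_before_epidemic(y) for name, y in estimates.items()}
```

Rounded to whole years, the first two intervals give the "approximately 31 years" and "approximately 8 years" stated in the text.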

DISCUSSION 

This is the first report on the migration pattern of horseshoe 
bats in China and its relation to the epidemiology of
coronaviruses. In this study, SARSr-Rh-BatCoV was found among
9.4% and 6.3% of alimentary specimens from Chinese
horseshoe bats from Hong Kong and Guangdong, China,
respectively, with some bats coinfected with a group 1 coronavirus,
Rh-BatCoV HKU2. Both viruses showed peak activity during
spring, with an apparent alternate biennial activity. Mating and
feeding activity soon after hibernation in spring may have
facilitated the spread of the virus within the same roost and from
roost to roost. Although no disease association could be
observed, lower body weights were observed for bats positive for
SARSr-Rh-BatCoV (but not for bats positive for Rh-BatCoV
HKU2) than those negative for coronaviruses. The results of a
tagging exercise showed that long-distance migration of
Chinese horseshoe bats is uncommon, with the longest distance

FIG. 4. (A) Bootscan analysis using the genome sequence of SARSr-Rh-BatCoV strain Rf1 as the query sequence (A) and phylogenetic
analysis of its partial sequences to the corresponding regions in other SARSr-CoVs (indicated by the letters B to D above the graph). Bootscanning 
was conducted with Simplot version 3.5.1 (F84 model; window size, 1,500 bp; step size, 300 bp) on a gapless nucleotide alignment, generated with 
ClustalX. SARSr-Rh-BatCoV strain 279/04 (279), civet SARSr-CoV strain SZ3, and SARSr-Rh-BatCoV strain HKU3-1 were examined by 
bootscan analysis. (B to D) Phylogenetic trees were constructed for the regions corresponding to positions 16400 to 20700 (B), 20700 to 25000 (C), 
and 25000 to 3' end (D) by the neighbor-joining method using Kimura’s two-parameter correction, and bootstrap values were calculated from 1,000 
trees. Shaded strains represent strains included in bootscan analysis. Hel, helicase. Bars, 0.01 nucleotide substitution (B and C) or 0.005 nucleotide 
substitution (D). 


being 17 km from a northern location in fall to an eastern
location in winter, compatible with data from other Rhinolophus
species, which may migrate up to 30 km for hibernation (28, 29).
Nevertheless, such migration distances are sufficient for movement
between Hong Kong and many areas in Shenzhen, China, including the
wild animal markets where the first civet SARSr-CoV was identified
(Fig. 1). Such foraging ranges could have allowed for mixing of
SARSr-Rh-BatCoV strains of different geographical origins. Except
for one bat which carried SARSr-Rh-BatCoV for at least 2 weeks, all
bats positive for coronavirus had cleared the same virus by the
time of recapture. Moreover, tagged individuals positive for
SARSr-Rh-BatCoV were healthy during subsequent recapture,
evidencing survival after the viral infection, as reported for
European bat lyssavirus in meridional serotine bats from Spain (44).
The present findings suggest that SARSr-Rh-BatCoV causes
an acute, self-limiting infection associated with weight loss
in Chinese horseshoe bats, with viral clearance occurring

FIG. 5. Bootscan analysis using the genome sequence of SARSr-Rh-BatCoV strain Rf1 as the query sequence (A) and phylogenetic
analysis of its partial sequences to the corresponding regions in other SARSr-CoVs (indicated by the letters B to D above the graph). Bootscanning
was conducted with Simplot version 3.5.1 (F84 model; window size, 1,500 bp; step size, 300 bp) on a gapless nucleotide alignment, generated with
ClustalX. SARSr-Rh-BatCoV strain Rm1, SARSr-Rh-BatCoV strain 273/04 (273), and SARSr-Rh-BatCoV strain Rp3 were examined by bootscan
analysis. (B to D) Phylogenetic trees were constructed for the regions before position 18300 (B), positions 18300 to 19900 (C), and after position
19900 (D) by the neighbor-joining method using Kimura's two-parameter correction, and bootstrap values were calculated from 1,000 trees.
Shaded strains represent strains included in bootscan analysis. Bar, 0.01 nucleotide substitution.


between 2 weeks and 4 months. This is compatible with our
previous finding that the presence of neutralizing antibody
in their sera correlated with a lower viral load in alimentary
specimens (21).

The present study revealed that recombination events are
common between different SARSr-Rh-BatCoV strains from
different species of bats and geographical locations, which may
account for the emergence of a civet SARSr-CoV capable of
cross-species transmission from bats to civets and from civets
to humans. Genome sequence comparison of SARSr-Rh-BatCoVs from
horseshoe bats and human/civet SARSr-CoV showed that they shared
only 87 to 92% nucleotide identity. Therefore, genetic events,
such as mutation and/or recombination, would have occurred during
the evolution of these SARSr-CoVs before the possible emergence of
direct progenitors of SARSr-CoV capable of infecting palm civets
and subsequently humans. Reconstruction of chimeric recombinant
SARSr-CoV using the S sequences of various SARSr-Rh-BatCoVs found
in horseshoe bats will reveal if any particular part of the S
sequence is important for infection of civets and/or humans (37).
In the present study, frequent recombination events were
identified among SARSr-CoVs. Moreover,
civet SARSr-CoV strain SZ3 was shown to be a potential
recombinant of SARSr-Rh-BatCoV strain Rp3 from Guangxi

TABLE 1. Nonsynonymous and synonymous substitutions in the coding regions of SARSr-Rh-BatCoV, civet SARSr-CoV, and human SARSr-CoV genomes^a

Gene     SARSr-Rh-BatCoV           Civet SARSr-CoV           Human SARSr-CoV
         Ka      Ks      Ka/Ks     Ka      Ks      Ka/Ks     Ka      Ks      Ka/Ks
ORF1     0.012   0.259   0.046     0.001   0.002   0.500     0.000   0.001   0.000
ORF3a    0.029   0.181   0.160     0.004   0.002   2.000     0.005   0.004   1.250
                                                             0.005   0.008   0.625
S        0.023   0.428   0.054     0.003   0.002   1.500     0.002   0.002   1.000
E        0.007   0.046   0.152                               0.008   0.004   2.000
M        0.006   0.110   0.055     0.001   0.007   0.143     0.005   0.003   1.667
ORF6     0.011   0.073   0.151     0.007   0                 0.005   0.011   0.455
ORF7a    0.013   0.177   0.073     0.001   0.019   0.053     0.003   0.012   0.250
ORF7b    0.100   0.183   0.546     0.010   0.000
ORF8     0.215   0.388   0.554     0.007
ORF8a                                                        0.013   0.019   0.684
ORF8b                                                        0.003   0.000
N        0.011   0.106   0.104     0.001   0.007   0.143     0.002   0.003   0.667

^a The number of synonymous substitutions per synonymous site (Ks), the number of nonsynonymous substitutions per nonsynonymous site (Ka), and the Ka/Ks ratio
for each coding region were calculated for all available SARSr-Rh-BatCoV, civet SARSr-CoV, and human SARSr-CoV genomes.


Province, China, and SARSr-Rh-BatCoV strain Rf1 from Hubei
Province, China, by both phylogenetic and bootscan analyses, with
the recombination breakpoint identified at the nsp16/S intergenic
region. This suggests that civet SARSr-CoV either may have evolved
from an ancestor that is a direct recombinant between strains Rp3
and Rf1 or may be a direct recombinant of lineages closely related
to Rp3 and Rf1 that are yet to be identified. This finding is in
line with a previous study showing that SARSr-Rh-BatCoV Rp3 was a
potential recombinant of SARSr-Rh-BatCoV and an unidentified
lineage closely related to civet SARSr-CoV (13). However, civet
SARSr-CoV was not used as the query sequence for analysis in
that study, and no conclusion was drawn on the origin of civet
strains. In addition, we detected other potential recombination
events when different SARSr-Rh-BatCoV strains were used as the
query sequence for analysis, with the most notable recombination
sites located at nsp16 and nsp14/15.
We have previously described the first evidence of natural
recombination in a human coronavirus, HCoV-HKU1, that led
to the generation of different genotypes, which also represents

FIG. 6. Estimation of the time of interspecies transmission of SARSr-CoV. Squares denote the MRCA of all SARSr-CoV (1972), the
MRCA of human/civet SARSr-CoV and the closest SARSr-Rh-BatCoV (1995), and the MRCA of human and civet SARSr-CoV (2001),
respectively.

the first report to describe a distribution of the recombination
spots in the entire genome of field isolates of a coronavirus
(46). Interestingly, in that report, the most significant
recombination event was also observed at nsp16. Therefore, the 3'
region of ORF1 and the ORF1/S junction are likely frequent
sites of natural recombination in coronaviruses. Alternatively,
the apparent hot spot of recombination may be due to the
nonviability of the recombinant viruses that cross over at other
points. In fact, recombination at the junction of the
nonstructural and structural genes essentially defines the
evolution of the nidoviruses. In wildlife food markets and
restaurants in southern China, a huge variety of animals of
different geographical origins are often caged in a crowded
environment, which may allow cross-species viral transmission and
recombination events (47). Further surveillance studies of different
horseshoe bat species from different provinces of China and
genome analysis of their SARSr-Rh-BatCoV strains may
reveal further evidence for the recombination origin of
SARSr-CoV.
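
The bootscan settings quoted in the figure legends (window size, 1,500 bp; step size, 300 bp) can be illustrated with a simplified sliding-window scan. The sketch below is an illustration under stated assumptions, not the procedure used in this study: it ranks reference strains by raw nucleotide identity within each window, whereas Simplot's bootscan builds bootstrapped trees for each window under the F84 model. The function names are hypothetical, and any sequences passed in are assumed to come from a gapless alignment of equal length.

```python
from typing import Dict, Iterator, List, Tuple


def windows(length: int, size: int = 1500, step: int = 300) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) windows along an alignment of the given length."""
    for start in range(0, max(length - size, 0) + 1, step):
        yield start, min(start + size, length)


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_reference_per_window(
    query: str,
    references: Dict[str, str],
    size: int = 1500,
    step: int = 300,
) -> List[Tuple[int, int, str]]:
    """For each window, report the reference most similar to the query.

    Simplification: raw identity stands in for the bootstrap support
    that a real bootscan would compute from per-window trees.
    """
    result = []
    for start, end in windows(len(query), size, step):
        scores = {
            name: identity(query[start:end], seq[start:end])
            for name, seq in references.items()
        }
        best = max(scores, key=scores.get)
        result.append((start, end, best))
    return result
```

A change in the closest reference between adjacent windows is the kind of signal that would flag a candidate breakpoint, which would then need confirmation by phylogenetic analysis of the regions on either side, as done in panels B to D of the figures.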

With the availability of a larger set of SARSr-Rh-BatCoV
genome sequences for analysis, the present study also attempts
to estimate more accurately the time of emergence of SARSr-CoV
in civets, which was shown to be only 8 years before the SARS
epidemic in 2003. Despite the identification of SARSr-Rh-BatCoV
in various horseshoe bat species from China, there has not been
sufficient evidence to determine if bats are the host for the
direct ancestor of civet and human SARSr-CoV. On the basis of the
considerable phylogenetic distance between bat and civet/human
strains, one study concluded that the current bat strains are
unlikely to be the direct ancestor (34). A subsequent study using
molecular dating analysis showed that the estimated date of the
interspecies transmission event from bats to an amplifying host,
such as the civet, was 17 years (HPD, 2 to 39 years) before the
SARS epidemic (45). It was therefore concluded that there may be
a yet unidentified intermediate host between bats and civets or
unidentified SARSr-Rh-BatCoV strains that are even more closely
related to civet SARSr-CoV. Another study estimated the date of
interspecies jumping to be much more recent, approximately 4.08
years (HPD, 1.45 to 8.84 years) before the SARS epidemic (13).
Results from the present study are more in line with the latter
estimate, with the date of divergence between human/civet and bat
strains estimated to be 8 years (HPD, 2.9 to 16.5 years) before
the SARS epidemic. The discrepancies between the different
estimates may be due to the choice of different gene sequences
and different numbers of strains for analysis. The study that
concluded a much older date of divergence used only the helicase
domain sequence for its analysis, whereas the other study,
similar to this study but with fewer strains, used the complete
ORF1 sequence for analysis. The availability of sequences of more
strains collected over a longer period of time may further improve
the accuracy of such estimation. On the other hand, all three
studies support the view that SARSr-CoVs are likely a newly
emerged subgroup of Betacoronavirus, with the median date of their
MRCA estimated to be from 1961 to 1982 (13, 45). The emergence of
diverse virus strains in the different Rhinolophus species within
a few decades suggests that this novel group of coronaviruses is
rapidly evolving and may easily cross the species barrier.
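
The divergence estimates compared above are expressed as years before the 2003 epidemic, and converting them to calendar years makes the three studies easier to compare. The following is a minimal sketch that uses only the figures quoted in this section. The function and variable names are illustrative assumptions, and the conversion is plain subtraction from the reference year rather than a reanalysis of any data.

```python
EPIDEMIC_YEAR = 2003  # reference year stated in the text

# (point estimate, HPD lower bound, HPD upper bound), in years before the epidemic
ESTIMATES = {
    "helicase-domain study (45)": (17.0, 2.0, 39.0),
    "ORF1 study with fewer strains (13)": (4.08, 1.45, 8.84),
    "present study": (8.0, 2.9, 16.5),
}


def to_calendar_year(years_before: float, reference_year: float = EPIDEMIC_YEAR) -> float:
    """Convert 'years before the epidemic' into a calendar year."""
    return reference_year - years_before


def summarize(estimates=ESTIMATES):
    """Return {label: (point year, earliest year, latest year)}."""
    summary = {}
    for label, (point, hpd_low, hpd_high) in estimates.items():
        summary[label] = (
            to_calendar_year(point),
            to_calendar_year(hpd_high),  # the larger interval bound gives the earliest year
            to_calendar_year(hpd_low),   # the smaller interval bound gives the latest year
        )
    return summary


if __name__ == "__main__":
    for label, (point, earliest, latest) in summarize().items():
        print(f"{label}: {point:.1f} (interval {earliest:.1f} to {latest:.1f})")
```

For the present study this gives 1995, the date shown in Fig. 6 for the MRCA of human/civet SARSr-CoV and the closest SARSr-Rh-BatCoV.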

Comparison of all available complete genome sequences of
SARSr-Rh-BatCoVs showed that their genome sequences were closely
related, with the same genome organization, except for strain
HKU3-8 from Guangdong, China, which has three small ORFs in the
ORF8 region instead of a single ORF. In particular, strains from
the same geographical location were highly similar. Upon
phylogenetic analysis of the individual ORFs, the strains from
Hong Kong and Guangdong, Guangxi, and Hubei, China, formed
separate clusters in most cases, although the strains from Hong
Kong and Guangdong, China, are more closely related, probably
due to their geographical proximity. However, given the frequent
recombination and short genetic distance among the different bat
strains, no distinct subgroups can be classified. Similar to
previous studies, the most variable regions in the
SARSr-Rh-BatCoV genomes were located in S, ORF3, and ORF8 (21,
25). In particular, the frequent deletions in the ORF8 region in
SARSr-Rh-BatCoV, together with the previously reported 29-bp
deletion in human SARSr-CoV, suggest that this is a frequent site
for deletions in SARSr-CoV. Moreover, the relatively high Ka/Ks
ratios observed at ORF7b and ORF8 further support that this is a
region subject to rapid evolution under strong positive
selection. The human SARSr-CoV 7b protein is an integral membrane
protein localized in the Golgi apparatus and contributes to
virus-induced apoptosis (35, 36). The human SARSr-CoV ORF8a
enhances viral replication and induces apoptosis through a
mitochondrion-dependent pathway, whereas the longer ORF8 protein
of civet and early human isolates is a cleaved protein stable in
the endoplasmic reticulum (4). Further studies are required to
understand the biological significance of the high mutation rate
in this part of the SARSr-CoV genomes.
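
The Ka/Ks ratios discussed here follow from the definition in the footnote to Table 1: nonsynonymous substitutions per nonsynonymous site divided by synonymous substitutions per synonymous site. The sketch below is a minimal illustration that assumes the ratio is simply Ka divided by Ks and treats it as undefined when Ks is zero or missing, which appears consistent with the blank ratio cells in Table 1. The function name is hypothetical, and the estimation of Ka and Ks from codon alignments is not shown.

```python
from typing import Optional


def ka_ks_ratio(ka: Optional[float], ks: Optional[float]) -> Optional[float]:
    """Return Ka/Ks, or None when the ratio cannot be computed.

    Assumption: a zero or missing Ks leaves the ratio undefined
    rather than reporting infinity.
    """
    if ka is None or ks is None or ks == 0:
        return None
    return ka / ks


if __name__ == "__main__":
    # SARSr-Rh-BatCoV values for ORF7b and ORF8, taken from Table 1
    for gene, ka, ks in [("ORF7b", 0.100, 0.183), ("ORF8", 0.215, 0.388)]:
        ratio = ka_ks_ratio(ka, ks)
        print(gene, "undefined" if ratio is None else f"{ratio:.3f}")
```

Because the inputs are the rounded values printed in the table, ratios recomputed this way may differ in the last digit from values derived from unrounded estimates.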